Abstract: We introduce a novel adaptive neuro-fuzzy architecture
based on the framework of Multiple Instance Fuzzy Inference. The new
architecture, called Multiple Instance-ANFIS (MI-ANFIS), is an
extension of the standard Adaptive Neuro-Fuzzy Inference System
(ANFIS) [1] that is designed to reason with multiple instances (bags
of instances) as input and is capable of learning from ambiguously
labeled data. In multiple instance problems, instances are grouped
into bags; the labels of the bags are known, but those of the
individual instances are not. Multiple Instance Learning (MIL) deals
with learning a classifier at the bag level. Over the years, many
solutions to this problem have been proposed. However, no MIL
formulation employing fuzzy inference exists in the literature. In
this paper, we develop MI-ANFIS, which generalizes ANFIS inference
systems to account for ambiguity and reason with multiple instances.
We also develop a learning algorithm to learn the parameters of
MI-ANFIS. The proposed MI-ANFIS is tested and validated using a
synthetic data set and benchmark data sets suitable for MIL problems.

I.  Introduction 

The standard Adaptive Neuro-Fuzzy Inference System (ANFIS) [1] is a
universal approximator that combines the learning and modeling power
of neural networks and fuzzy logic into an adaptive inference system.
Neural networks deal with imprecise data by training, while fuzzy
logic can deal with the uncertainty of human cognition [2]. ANFIS
offers an alternative approach to rule identification. While Mamdani
[3] and Sugeno [4] fuzzy systems identify rules based on intuition,
ANFIS jointly learns the optimal input space partition and the
optimal output parameters through optimization. ANFIS is considered a
hybrid intelligent system, and it provides a systematic approach to
learning fuzzy rules from a given input-output dataset using
supervised learning.
Typically, in supervised learning problems, access to large labeled
training datasets improves the performance of the devised algorithms
by overcoming noise and adding robustness and generalization to
unseen examples. Even though large amounts of data are available and
could be used for learning, in many applications this data is labeled
ambiguously and at a coarse level. In fact, labels, or tags, tend to
be associated with collections of samples rather than single samples.
For example, in image annotation, tags could be used as indicators of
the existence of objects of interest within the images (sky, sea,
beach, ...). However, the exact location of those objects is not
available and is too tedious to extract for large collections of
images. An alternative and relatively new learning framework that
tackles this inherent ambiguity better than supervised learning is
the Multiple Instance Learning (MIL) paradigm [5]. Unlike standard
supervised learning, in MIL an example is not a single data point but
a collection of instances, called a bag. Each bag can contain a
different number of instances. A bag is labeled negative if all of
its instances are negative, and positive if at least one of its
instances is positive¹. Positive bags can encode ambiguity since the
instances themselves are not labeled. Given a training set of labeled
bags, the goal of MIL is to learn a concept that predicts the labels
of training data and generalizes to predict the labels of testing
bags [6].
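
The bag-labeling rule described above can be illustrated with a minimal sketch. The function name `bag_label` and the example bags are hypothetical; the instance labels are shown only for illustration, since a MIL learner never observes them.

```python
from typing import Sequence

def bag_label(instance_labels: Sequence[int]) -> int:
    """Standard MIL assumption: a bag is positive (1) if at least one
    instance is positive, and negative (0) only if all are negative."""
    return int(any(label == 1 for label in instance_labels))

# Hypothetical bags of different sizes (instance labels are hidden in practice).
bags = {
    "bag_a": [0, 0, 1, 0],  # one positive instance among negatives
    "bag_b": [0, 0],        # all instances negative
    "bag_c": [1, 1, 1],     # all instances positive
}
labels = {name: bag_label(inst) for name, inst in bags.items()}
print(labels)
```

Note that `bag_a` and `bag_c` receive the same label even though their contents differ, which is the ambiguity that positive bags encode.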

To take full advantage of the standard ANFIS system in the context of
MIL, bags need to be labeled at the instance level by human experts
to make learning possible [7]. Unfortunately, this process is
tedious, ambiguous, subjective, and prone to errors. To address this
major limitation, we introduce an adaptive neuro-fuzzy architecture
that is designed to reason with bags of instances as input and is
capable of learning from ambiguously labeled data. The new
architecture is called Multiple Instance-ANFIS (MI-ANFIS).
The rest of this paper is organized as follows. Section II describes
the architecture of the proposed MI-ANFIS, and a corresponding
learning algorithm is introduced in Section III. Section IV presents
the experimental results on a synthetic data set and benchmark data
sets. Finally, we provide the conclusions in Section V.

II.  MI-ANFIS  Architecture 

In the following, let $B_p$ be a bag of $M_p$ instances with the
$j$th instance denoted as $\mathbf{x}_{pj}$. $\mathbf{x}_{pj}$ is in
turn a $D$-dimensional vector with elements $x_{pjk}$ corresponding
to features, i.e., $\mathbf{x}_{pj} = \{x_{pj1}, \ldots, x_{pjD}\}$.
Note that the number of instances can vary between bags ($M_p$
depends on $B_p$). A bag is labeled positive if at least one of its
instances is positive, and negative if all of its instances are
negative.

¹ Note that positive bags may also contain negative instances.

Fig. 1. Architecture of the proposed multiple instance Adaptive Neuro-Fuzzy Inference System.


We introduce our MI-ANFIS for the simple case of two rules. Equation
(1) describes the MI-ANFIS with two Sugeno rules. Here, $A_k^i$ is a
fuzzy set associated with the $k$th instance feature, “$\vee$” is a
joint operator that can be any T-conorm (or, max, sum, etc.), and
$\mathbf{b}^i = \{b_0^i, \ldots, b_D^i\}$ is a set of polynomial
coefficients. The premise part of the rule is evaluated as in the
ANFIS case. To evaluate the consequent part, first the linear
response of each instance is computed, i.e., $\mathbf{x}_{pj} \cdot
\mathbf{b}^i$. Then, a function $C$ is used to compute the final
output by combining the instances’ responses. Many functions could be
used, and the choice should be domain-specific. For instance, the
“max” function has been used in many applications.

Figure 1 illustrates the proposed MI-ANFIS architecture. The upper
and lower parts of the network correspond to the first and second
fuzzy rules, respectively. As in the traditional ANFIS, nodes at the
same layer have similar functions. We denote the output of the $i$th
node in layer $l$ as $O_{l,i}$.

Layer 1 is an adaptive layer. It calculates the degree to which a
given input instance satisfies a quantifier $A$. Every node evaluates
the membership degree of an input instance in the fuzzy set $A_{kj}$
with membership function $\mu_{A_{kj}}$. Generally, $\mu_{A_{kj}}$ is
a parameterized membership function (MF), for example a Gaussian MF
with

$$\mu_{A_{kj}}(x) = \exp\left(-\frac{(x - c_{kj})^2}{2\sigma_{kj}^2}\right). \quad (3)$$

In (3), $c_{kj}$ and $\sigma_{kj}$ are the mean and standard
deviation of the Gaussian function, and are referred to as the
premise parameters.
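
As a minimal sketch of the Layer 1 computation, the Gaussian MF in (3) can be evaluated for every instance and feature of a bag at once. The function name `gaussian_mf`, the example bag, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_mf(x, c, sigma):
    """Gaussian membership degree of x for a fuzzy set with center c
    and width sigma, following the form of (3)."""
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

# Hypothetical bag with M_p = 3 instances and D = 2 features.
bag = np.array([[0.0, 1.0],
                [1.0, 1.0],
                [3.0, -1.0]])
c = np.array([1.0, 1.0])      # illustrative centers c_kj for one rule
sigma = np.array([0.5, 0.5])  # illustrative widths sigma_kj for one rule

# Broadcasting evaluates every (instance, feature) pair: shape (3, 2).
memberships = gaussian_mf(bag, c, sigma)
print(memberships)
```

The second instance coincides with the centers, so both of its membership degrees equal 1.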


Layer 2 is a fixed layer. Every node computes the product of all
incoming inputs. In the context of multiple instance fuzzy logic,
Layer 2 evaluates the degree of truth of proposition instances, or
simply, “truth instances”. The output of this layer is given by (4),
where $\lceil \cdot \rceil$ is the ceiling operator and $i[M_p]$ is
$i \bmod M_p$. As in the traditional ANFIS, any T-norm can be used as
the node function in this layer.

Layer 3 is a new addition when compared to the traditional ANFIS.
Every node aggregates the truth instances of the previous layer by
means of a smooth T-conorm. In this paper, we use a smooth
approximation of the “max” T-conorm known as the “softmax” function
($S_\alpha$):

$$\mathrm{softmax}_\alpha(x_1, x_2, \ldots, x_n) = S_\alpha(x_1, x_2, \ldots, x_n) = \frac{\sum_{i=1}^{n} x_i e^{\alpha x_i}}{\sum_{j=1}^{n} e^{\alpha x_j}}. \quad (5)$$

In (5), as pointed out by Maron in [8], the parameter $\alpha$
determines how closely softmax approximates the max operator. As
$\alpha$ approaches $\infty$, softmax’s behavior approaches max. When
$\alpha = 0$, it calculates the mean. As $\alpha$ approaches
$-\infty$, softmax’s behavior approaches the min operator.
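
The limiting behavior of the softmax aggregation can be checked numerically with a short sketch. The function name `softmax_agg` and the example truth values are assumptions; the max-shift inside the exponential is a standard numerical safeguard and does not change the value of (5).

```python
import numpy as np

def softmax_agg(x, alpha):
    """Softmax aggregation S_alpha from (5): a weighted mean of x with
    weights proportional to exp(alpha * x)."""
    x = np.asarray(x, dtype=float)
    z = alpha * x
    weights = np.exp(z - z.max())  # shift by the max for numerical stability
    return float(np.sum(x * weights) / np.sum(weights))

truth = [0.1, 0.7, 0.4]  # hypothetical truth instances of one rule
for alpha in (-50.0, 0.0, 1.0, 50.0):
    print(alpha, softmax_agg(truth, alpha))
```

With a large positive `alpha` the result is close to the largest truth instance, with `alpha = 0` it is the mean, and with a large negative `alpha` it is close to the smallest one.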

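The outputs of this layer are the firing strengths of the multiple
instance fuzzy rules defined by Layers 1 through 3, i.e., the output
of node $i$, $O_{3,i} = w_i$, is obtained by applying the softmax
function to the truth instances of rule $i$ (6), where $\alpha$ is a
fixed constant. Layer 3 is also a fixed layer.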

Layer 4 is a fixed layer. Every node labeled N of this layer
calculates the normalized firing strength of each rule:

$$O_{4,i} = \bar{w}_i = \frac{w_i}{\sum_{k=1}^{|O_3|} w_k}, \quad (7)$$

where $|O_3|$ is the number of rules.

Layer 5 is an adaptive layer. Every node $i$ in this layer computes
the output of the $i$th multiple instance rule. Because our MI-ANFIS
is functionally equivalent to a multiple instance Sugeno fuzzy
inference system, the output of each rule is computed using the
combining function $C$:

$$O_{5,i} = C(\mathbf{x}_{p1} \cdot \mathbf{b}^i, \mathbf{x}_{p2} \cdot \mathbf{b}^i, \ldots, \mathbf{x}_{pM_p} \cdot \mathbf{b}^i), \quad (8)$$

where $\mathbf{x}_{pj} = \{x_{pj1}, \ldots, x_{pjD}\}$ for $j = 1,
\ldots, M_p$, and $\mathbf{b}^i = \{b_0^i, \ldots, b_D^i\}$ is a set
of polynomial coefficients. The parameters $\mathbf{b}^i$ are
referred to as the consequent parameters. The only constraint on $C$
is that it has to be a smooth function so that optimization
techniques can be applied. In the following, we choose “softmax” as
the combining function for this layer. In this case, weighting by the
normalized firing strength $\bar{w}_i$ from Layer 4, (8) becomes

$$O_{5,i} = \bar{w}_i S_\alpha(\mathbf{x}_{p1} \cdot \mathbf{b}^i, \mathbf{x}_{p2} \cdot \mathbf{b}^i, \ldots, \mathbf{x}_{pM_p} \cdot \mathbf{b}^i). \quad (9)$$

Note that the constant $\alpha$ here is not necessarily the same as
in Layer 3.

Layer 6 is a fixed layer with a single node labeled $\Sigma$. As in
the traditional ANFIS, it computes the overall output of the system
using

$$O_{6,1} = \sum_{i=1}^{|O_3|} O_{5,i} = \sum_{i=1}^{|O_3|} \bar{w}_i S_\alpha(\mathbf{x}_{p1} \cdot \mathbf{b}^i, \mathbf{x}_{p2} \cdot \mathbf{b}^i, \ldots, \mathbf{x}_{pM_p} \cdot \mathbf{b}^i). \quad (10)$$


III. Basic Learning Algorithm

To identify the parameters of the proposed MI-ANFIS network, we
propose a variation of the basic learning algorithm presented by Jang
[9]. Our variation differs from the standard ANFIS backpropagation
learning rule because of the additional layers our network
introduces, as well as the use of new activation functions at the
node level, such as “softmax”.

A. Backpropagation Learning Rule

In the following, we assume that we have $N$ training bags,
$B = \{B_p \mid p = 1, \ldots, N\}$, and their corresponding labels
$T = \{t_p \mid p = 1, \ldots, N\}$.

First, for the $p$th training bag, we compute a squared error measure
commonly used in the backpropagation algorithm and defined as

$$E_p = (t_p - O_p)^2. \quad (11)$$

In (11), $t_p$ is the desired bag output, and $O_p$ is the computed
output of the network when presented with training bag $p$. Equation
(11) demonstrates the need for MI-ANFIS. In fact, due to the absence
of instance labels, errors can be computed only at the bag level.
Errors at the instance level cannot be computed and are not needed,
as we will show later.

The overall error measure of the network is

$$E = \sum_{p=1}^{N} E_p. \quad (12)$$

To develop the gradient descent optimization on $E$, we compute the
error rate for the $p$th training bag and for each node output
$O_{l,i}$. This error rate $\epsilon_{l,i}$ (where $1 \le l \le 6$
indicates the MI-ANFIS layer) is defined as

$$\epsilon_{l,i} = \frac{\partial E_p}{\partial O_{l,i}}. \quad (13)$$

The error rate at the output node is given by

$$\frac{\partial E_p}{\partial O_{6,1}} = \frac{\partial E_p}{\partial O_p} = -2(t_p - O_p). \quad (14)$$

For non-output nodes (i.e., internal nodes, $l < 6$), we derive the
error rate using the chain rule

$$\epsilon_{l,i} = \frac{\partial E_p}{\partial O_{l,i}} = \sum_{h=1}^{\mathrm{Card}(l+1)} \frac{\partial E_p}{\partial O_{l+1,h}} \frac{\partial O_{l+1,h}}{\partial O_{l,i}}, \quad (15)$$

where $\mathrm{Card}(l+1)$ refers to the number of nodes at layer $l+1$.
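
To make the bag-level error in (11) and the output error rate in (14) concrete, the sketch below runs one bag through the six layers of Section II. It is an illustration under stated assumptions rather than the authors' implementation: the names `softmax_agg` and `mi_anfis_forward` are hypothetical, Layer 2 uses the product T-norm, each instance is augmented with a leading 1 so that $b_0^i$ acts as a constant term, and all parameter values are made up.

```python
import numpy as np

def softmax_agg(x, alpha):
    """S_alpha from (5), computed with a max-shift for numerical stability."""
    x = np.asarray(x, dtype=float)
    z = alpha * x
    w = np.exp(z - z.max())
    return float(np.sum(x * w) / np.sum(w))

def mi_anfis_forward(bag, c, sigma, b, alpha3=1.0, alpha5=1.0):
    """bag: (M, D); c, sigma: (R, D); b: (R, D + 1), b[:, 0] is the constant term."""
    # Layer 1: membership of each instance feature in each rule's fuzzy set -> (R, M, D)
    mu = np.exp(-((bag[None, :, :] - c[:, None, :]) ** 2)
                / (2.0 * sigma[:, None, :] ** 2))
    # Layer 2: product over features gives the truth instances -> (R, M)
    truth = mu.prod(axis=2)
    # Layer 3: softmax aggregation of truth instances gives firing strengths -> (R,)
    w = np.array([softmax_agg(t, alpha3) for t in truth])
    # Layer 4: normalized firing strengths
    w_bar = w / w.sum()
    # Layer 5: softmax-combined linear responses of the instances, per rule
    x_aug = np.hstack([np.ones((bag.shape[0], 1)), bag])  # assumed 1 for b_0
    rule_out = np.array([softmax_agg(x_aug @ b_i, alpha5) for b_i in b])
    # Layer 6: weighted sum over rules
    return float(np.sum(w_bar * rule_out))

bag = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])  # M_p = 3, D = 2
c = np.array([[1.0, 1.0], [0.0, 0.0]])                # two rules
sigma = np.full((2, 2), 0.5)
b = np.array([[0.5, 0.0, 0.0], [0.5, 0.0, 0.0]])      # zero-order consequents

t_p = 1.0                           # bag label
o_p = mi_anfis_forward(bag, c, sigma, b)
e_p = (t_p - o_p) ** 2              # bag-level error, as in (11)
err_rate = -2.0 * (t_p - o_p)       # output error rate, as in (14)
print(o_p, e_p, err_rate)
```

Because both rules use the same constant consequent 0.5 here, the output is 0.5 regardless of the firing strengths, which makes the example easy to verify by hand.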
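Next, we seek to minimize the network error with respect to the
premise parameters $\{c_{kj}, \sigma_{kj} \mid 1 \le k \le |O_3|,
1 \le j \le D\}$, and with respect to the consequent parameters
$\{\mathbf{b}^i\}_{i=1}^{|O_3|}$. The error rate with respect to a
generic parameter $\theta$ can be computed using

$$\frac{\partial E_p}{\partial \theta} = \sum_{O^* \in S} \frac{\partial E_p}{\partial O^*} \frac{\partial O^*}{\partial \theta}, \quad (16)$$

where $S$ is the set of nodes whose outputs depend on $\theta$.
Using (12), the total error rate is given by

$$\frac{\partial E}{\partial \theta} = \sum_{p=1}^{N} \frac{\partial E_p}{\partial \theta}. \quad (17)$$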
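
Propagating error rates through the layers that use the softmax aggregation requires its partial derivatives. As a worked step derived here directly from the definition in (5) by the quotient rule (it is not quoted from the source equations), one obtains:

```latex
\frac{\partial S_\alpha}{\partial x_i}
  = \frac{e^{\alpha x_i}}{\sum_{j=1}^{n} e^{\alpha x_j}}
    \left[ 1 + \alpha \left( x_i - S_\alpha(x_1, \ldots, x_n) \right) \right]
```

For $\alpha = 0$ this reduces to $1/n$, the derivative of the mean, which agrees with the limiting behavior of softmax.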


1) Update Rule for Premise Parameters: First, we compute the error
rate for the premise parameters $c_{kj}$ and $\sigma_{kj}$. We have

$$\frac{\partial E_p}{\partial c_{kj}} = \sum_{i=1}^{M_p} \frac{\partial E_p}{\partial O_{(1,\, i+[(k-1)D+(j-1)]M_p)}} \frac{\partial O_{(1,\, i+[(k-1)D+(j-1)]M_p)}}{\partial c_{kj}}, \quad (18)$$

and

$$\frac{\partial E_p}{\partial \sigma_{kj}} = \sum_{i=1}^{M_p} \frac{\partial E_p}{\partial O_{(1,\, i+[(k-1)D+(j-1)]M_p)}} \frac{\partial O_{(1,\, i+[(k-1)D+(j-1)]M_p)}}{\partial \sigma_{kj}}. \quad (19)$$

Using the chain rule defined in (15), it can be shown that
$\partial E_p / \partial c_{kj}$ admits the closed-form expression
(20), which involves the product of the membership degrees over the
features $d \ne j$ and the Gaussian term
$\exp\left(-(x_{(p,\,(i+(k-1)M_p)[M_p],\,j)} - c_{kj})^2 / (2\sigma_{kj}^2)\right)$.

As in the standard ANFIS, an update formula for the parameter
$c_{kj}$ is given by

$$\Delta c_{kj} = -\eta \frac{\partial E}{\partial c_{kj}}, \quad (21)$$

where $\eta$ is a learning rate determined in a similar manner to
that of the standard backpropagation algorithm [9].

The update formula for $\sigma_{kj}$ can be derived in a similar
manner. It can be shown that $\partial E_p / \partial \sigma_{kj}$
admits the closed-form expression (22), and the update formula for
$\sigma_{kj}$ is as follows

$$\Delta \sigma_{kj} = -\eta \frac{\partial E}{\partial \sigma_{kj}}, \quad (23)$$

where $\eta$ is the same learning rate as in (21).

Equations (21) and (23) can be used to update the $c_{kj}$ and
$\sigma_{kj}$ parameters either on-line, bag by bag (we emphasize
that on-line learning is not performed instance by instance, but
rather bag by bag), or after presentation of the entire data set.
This latter mode of learning is known as batch learning or off-line
learning. Next, we develop the update rules for the consequent
parameters.

Using the previously defined chain rule in (15), it can be shown that
the overall error rate with respect to the consequent parameter
$b_j^i$ is given according to (17) by (24) through (26). Hence, the
update formula for the consequent parameter $b_j^i$ is

$$\Delta b_j^i = -\eta \frac{\partial E}{\partial b_j^i}, \quad (27)$$

where $\eta$ is the same learning rate as in (21).

Equation (27) will be used to update $b_j^i$ either on-line or
off-line, depending on the MI-ANFIS implementation.

So far, we have derived all necessary update formulas for the
MI-ANFIS premise and consequent parameters. Next, we present our
MI-ANFIS basic learning algorithm. It is an iterative algorithm that
involves successive updates of the premise and consequent parameters.
It is summarized in Algorithm 1.


Algorithm 1 MI-ANFIS Basic Learning Algorithm

Inputs:  $B$: the set of training bags.
         $T$: the set of training labels.
         $M$: the number of instances in each bag.
         $\alpha$: the constant used in the “softmax” function.
         $\eta$: the learning rate.
         $e$: the number of epochs.
Outputs: $\mathbf{b}^i$: the sets of consequent parameters.
         $c^i$: the set of membership functions’ centers.
         $\sigma^i$: the set of membership functions’ widths.

Initialize $\mathbf{b}^i$, $c^i$, and $\sigma^i$.
repeat
    Update $\mathbf{b}^i$ using (27).
    Update $c^i$ using (21).
    Update $\sigma^i$ using (23).
until parameters do not change significantly or the number of epochs is exceeded
return $\mathbf{b}^i$, $c^i$, $\sigma^i$
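
The structure of Algorithm 1 can be sketched as a batch training loop. This is an illustration under several assumptions, not the authors' implementation: gradients are approximated by central finite differences instead of the closed-form expressions, Layer 2 uses the product T-norm, a leading 1 is prepended to each instance for the constant term, a small constant guards the normalization, widths are clipped to stay positive, and the toy data and all function names are hypothetical.

```python
import numpy as np

def softmax_agg(x, alpha):
    z = alpha * x
    w = np.exp(z - z.max())
    return np.sum(x * w) / np.sum(w)

def forward(bag, c, sigma, b, alpha):
    # Layers 1-2: Gaussian memberships multiplied over features -> (rules, instances)
    truth = np.exp(-((bag[None] - c[:, None]) ** 2)
                   / (2.0 * sigma[:, None] ** 2)).prod(axis=2)
    w = np.array([softmax_agg(t, alpha) for t in truth])        # Layer 3
    w_bar = w / (w.sum() + 1e-12)                                # Layer 4 (guarded)
    x_aug = np.hstack([np.ones((len(bag), 1)), bag])
    f = np.array([softmax_agg(x_aug @ b_i, alpha) for b_i in b])  # Layer 5
    return float(np.sum(w_bar * f))                              # Layer 6

def total_error(bags, labels, c, sigma, b, alpha):
    # Sum of bag-level squared errors, as in (11) and (12).
    return sum((t - forward(x, c, sigma, b, alpha)) ** 2
               for x, t in zip(bags, labels))

def numeric_grad(f, theta, h=1e-5):
    # Central differences; theta is perturbed in place and restored.
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        old = theta[idx]
        theta[idx] = old + h
        up = f()
        theta[idx] = old - h
        down = f()
        theta[idx] = old
        grad[idx] = (up - down) / (2.0 * h)
    return grad

def train(bags, labels, c, sigma, b, alpha=1.0, eta=0.01, epochs=20):
    c, sigma, b = (np.array(p, dtype=float) for p in (c, sigma, b))  # copies
    err = lambda: total_error(bags, labels, c, sigma, b, alpha)
    for _ in range(epochs):
        b -= eta * numeric_grad(err, b)          # role of update (27)
        c -= eta * numeric_grad(err, c)          # role of update (21)
        sigma -= eta * numeric_grad(err, sigma)  # role of update (23)
        np.clip(sigma, 1e-3, None, out=sigma)    # assumption: keep widths positive
    return c, sigma, b

# Toy data: positive bags hold one instance near (2, 2); the rest is background.
rng = np.random.default_rng(0)
pos = [np.vstack([rng.normal(2.0, 0.1, (1, 2)), rng.uniform(-1, 1, (3, 2))])
       for _ in range(5)]
neg = [rng.uniform(-1, 1, (4, 2)) for _ in range(5)]
bags, labels = pos + neg, [1.0] * 5 + [0.0] * 5

c0 = np.array([[2.0, 2.0], [0.0, 0.0]])
sigma0 = np.full((2, 2), 0.5)
b0 = np.full((2, 3), 0.1)
c1, sigma1, b1 = train(bags, labels, c0, sigma0, b0)
print(total_error(bags, labels, c0, sigma0, b0, 1.0),
      total_error(bags, labels, c1, sigma1, b1, 1.0))
```

Updates are applied once per epoch over the whole data set, which corresponds to the batch mode described in the text; a bag-by-bag variant would compute the gradient of a single $E_p$ inside an inner loop.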


IV. Experimental Results

In the following, we report on the experiments conducted to validate
the proposed MI-ANFIS. First, we use a synthetic data set to show the
potential of MI-ANFIS to learn concepts from ambiguously labeled
data. Then, we apply our method to benchmark data sets commonly used
in MIL problems and report the results.

A.  Synthetic  Data 

We use a simple synthetic dataset to illustrate the potential of
using MI-ANFIS to learn concepts from ambiguously labeled data. For
this purpose, we generated a synthetic dataset from a distribution of
two positive concepts, marked with black and red squares in Figure 2
(the concept points are unknown to MI-ANFIS). From each positive
concept we generated 50 bags. We also generated 50 negative bags
randomly from non-concept regions. Each bag has up to 10 instances.
The data is shown in Figure 2. Instances from negative bags are shown
as blue letters, and instances from positive bags are shown in red or
black letters depending on the underlying concept. Instances from the
same bag are displayed using the same letter. In Figure 2, we
highlight one bag from Concept 1 by circling all of its instances. As
can be seen, one instance is close to the dense red region (positive
concept) while the other instances are scattered around. Positive
bags are assigned a label of 1, and negative bags a label of 0.

In the following, for the purpose of demonstration, we apply only the
update equations of the premise parameters during the training
epochs, and show that the MI-ANFIS Basic Learning Algorithm
(Algorithm 1) is capable of identifying positive concepts as well as
their corresponding multiple instance fuzzy rules. To initialize the
premise parameters, we use the FCM [10] algorithm to partition the
instances’ space into 4 clusters². We use the clusters’ centers as
initial centers for the Gaussian MFs, and we initialize all standard
deviation parameters to a default value of 0.5.

² A grid or manual partitioning could also be used.

Fig. 2. Instances from positive and negative bags drawn from data that have 2 concepts.


The initial fuzzy sets (MFs) of the rules’ premise parts before
training are displayed in Figure 3(a). Updated parameters after 20
training epochs are shown in Figure 3(b), and learned fuzzy sets
after convergence are shown in Figure 3(c).

As can be seen, the algorithm has identified the two true concepts,
showing that MI-ANFIS can efficiently learn from partially labeled
data. More importantly, the system has correctly identified the
positive concepts and, at the same time, identified irrelevant rules
(MI-Rule 1 and MI-Rule 3, marked with red crosses in Figure 3(c)).
After training, it is recommended to detect and prune such rules to
improve MI-ANFIS testing efficiency. This can be achieved by setting
a minimum acceptable fuzzy set support below which rules containing
the set are considered irrelevant.

B.  Benchmark  Datasets 

To provide a quantitative evaluation of the proposed MI-ANFIS, we
apply it to five benchmark data sets commonly used to evaluate MIL
methods: MUSK1 and MUSK2 [11], and Fox, Tiger, and Elephant from the
COREL data set [12]. The MUSK1 and MUSK2 data sets consist of
descriptions of molecules, and the objective is to classify whether a
molecule smells musky [13]. In these data sets, each bag represents a
molecule. Instances in a bag represent the different low-energy
conformations of the molecule. Each instance consists of 166
features. MUSK1 has 92 bags, of which 47 are positive, and MUSK2 has
102 bags, of which 39 are positive. The other data sets from COREL
(Fox, Tiger, and Elephant) are used to classify whether an image
contains the corresponding animal. Each data set consists of 200
images (bags): 100 positive images containing the target animal and
100 negative images containing other animals. Each image is
represented as a set of patches (instances), and each patch is in
turn represented by 230 features describing color, texture, and shape
information. Table I summarizes the characteristics of the five data
sets. Note that for each benchmark data set, PCA was applied to
reduce the dimensionality of the features in order to speed up
MI-ANFIS training and increase the interpretability of the generated
multiple instance fuzzy rules.
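
One way the PCA step could be applied to bag-structured data is sketched below. The source does not state how PCA was fitted, so pooling all instances across bags before fitting is an assumption, as are the function names `fit_pca` and `transform_bags` and the random example data.

```python
import numpy as np

def fit_pca(instances, n_components):
    """Fit PCA on an (n_instances, D) array via SVD of the centered data."""
    mean = instances.mean(axis=0)
    _, _, vt = np.linalg.svd(instances - mean, full_matrices=False)
    return mean, vt[:n_components]  # rows are the principal directions

def transform_bags(bags, mean, components):
    """Project every instance of every bag; the bag structure is preserved."""
    return [(bag - mean) @ components.T for bag in bags]

rng = np.random.default_rng(0)
bags = [rng.normal(size=(m, 6)) for m in (3, 5, 2)]  # bags of different sizes
pooled = np.vstack(bags)                             # assumption: fit on all instances
mean, comps = fit_pca(pooled, n_components=2)
reduced = transform_bags(bags, mean, comps)
print([r.shape for r in reduced])
```

Each bag keeps its number of instances, and only the feature dimension shrinks, so the reduced bags can be fed to the network unchanged in structure.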

For all experiments, we construct a zero-order MI-ANFIS (constant
consequent parameters) having 15 multiple instance rules and
employing Gaussian MFs to describe the input fuzzy sets. For
initialization, we use the FCM algorithm to cluster the instances of
the positive bags into 15 clusters, and we initialize the MFs’
centers as the clusters’ centers. Table II summarizes all parameters
used in training the MI-ANFIS. We note that the reason behind using
large standard deviations for the MUSK1, MUSK2, and Fox data sets is
to allow the initial rules to cover the entirety of the input space.

Fig. 3. Input MFs before, during, and after MI-ANFIS training.
(b) Input MFs during MI-ANFIS training (epoch number 20).
(c) Learned MFs after convergence of the MI-ANFIS training algorithm.
Rules marked with red crosses are considered vanished and are pruned.
The remaining rules (MI-Rule 2 and MI-Rule 4) correctly describe the
positive concepts of the dataset.

TABLE I. Benchmark data sets

Data set   Dim. (PCA)   No. bags   Positive   Negative   Instances per bag
MUSK1      166 (25)     92         47         45         2 to 40
MUSK2      166 (25)     102        39         63         1 to 1044
Fox        230 (10)     200        100        100        2 to 13
Tiger      230 (10)     200        100        100        1 to 13
Elephant   230 (10)     200        100        100        2 to 13

TABLE II. MI-ANFIS training parameters

Parameter           MUSK1   MUSK2   Fox   Tiger   Elephant
No. of MI rules     15      15      15    15      15
No. of inputs       25      25      10    10      10
MF’s σ              100     100     100   10      10
Output parameters   1s      1s      1s    1s      1s
Softmax’s α         1       1       1     1       1
Learning rate       0.1     0.1     0.1   0.1     0.1

TABLE III. Comparison of MI-ANFIS prediction accuracy (in percent) to other methods on the benchmark data sets

Algorithm           MUSK1          MUSK2          Fox           Tiger         Elephant
MI-ANFIS            93.49 ± 0.76   90.58 ± 1.31   66.4 ± 2.77   84.5 ± 0.61   86.97 ± 1.10
MILES [14]          86.3           87.7           N/A           N/A           N/A
APR [11]            92.4           89.2           N/A           N/A           N/A
DD [8]              88.9           82.5           N/A           N/A           N/A
DD-SVM [15]         85.8           91.3           N/A           N/A           N/A
EM-DD [16]          84.8           84.9           56.1          72.1          78.3
Citation-KNN [17]   92.4           86.3           N/A           N/A           N/A
MI-SVM [12]         77.9           84.3           57.8          84.0          81.4
mi-SVM [12]         87.4           83.6           58.2          78.4          82.2
MI-NN [18]          88.0           82.0           N/A           N/A           N/A
Bagging-APR [19]    92.8           93.1           N/A           N/A           N/A
RBF-MIP [20]        91.3 ± 1.6     90.1 ± 1.7     N/A           N/A           N/A
BP-MIP [21]         83.7           80.4           N/A           N/A           N/A
RBF-Bag-Unit [22]   90.3           86.6           N/A           N/A           N/A
MI-kernel [23]      88.0           89.3           60.3          84.2          84.3
PPPM-kernel [24]    95.6           81.2           60.3          80.2          82.4
MIGraph [23]        90.0           90.0           61.2          81.9          85.1
miGraph [23]        88.9           90.3           61.6          86.0          86.8
ALP-SVM [25]        86.3           86.2           66.0          86.0          83.5
MIForest [26]       85.0           82.0           64.0          82.0          84.0
Naive-ANFIS         67.82 ± 4.04   79.43 ± 5.04   N/A           N/A           N/A

After initialization, we run the MI-ANFIS basic learning algorithm
(Algorithm 1) to jointly learn a fuzzy description of the positive
concepts as well as the optimal multiple instance rules’ outputs.

Table III shows the performance of the proposed algorithm on the
benchmark data sets. MI-ANFIS was trained and tested using ten-fold
cross-validation. The performance is reported in terms of prediction
accuracy (% correct ± standard deviation).
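
In a MIL setting, cross-validation folds are naturally formed over bags rather than instances, so that all instances of a bag stay on the same side of the split. The sketch below shows one way to do this; the function name `bag_level_folds`, the random shuffling, and the seed are assumptions, since the source does not describe how the folds were built.

```python
import numpy as np

def bag_level_folds(n_bags, n_folds=10, seed=0):
    """Split bag indices into n_folds disjoint test folds of near-equal size."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_bags)
    return [np.sort(fold) for fold in np.array_split(order, n_folds)]

n_bags = 92  # MUSK1 has 92 bags
folds = bag_level_folds(n_bags)
for k, test_idx in enumerate(folds):
    train_idx = np.setdiff1d(np.arange(n_bags), test_idx)
    # A model would be trained on bags[train_idx] and scored on bags[test_idx].
    print(k, len(train_idx), len(test_idx))
```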

To show the advantage of using MI-ANFIS over the traditional ANFIS,
we compare its performance to the latter on the benchmark data sets.
Given that ANFIS cannot learn from ambiguously labeled data, for the
sake of comparison we consider the naive MIL assumption, where all
instances from positive bags are considered positive and all
instances from negative bags are considered negative. We refer to
this implementation as Naive-ANFIS. An empirical comparison with
other MIL methods is also reported.
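
The naive MIL assumption amounts to copying each bag label to all of its instances, which turns the problem into ordinary supervised learning. A minimal sketch follows; the function name `naive_instance_labels` and the example bags are hypothetical.

```python
import numpy as np

def naive_instance_labels(bags, bag_labels):
    """Flatten bags into one instance matrix and give every instance
    the label of the bag it came from (the naive MIL assumption)."""
    X = np.vstack(bags)
    y = np.concatenate([np.full(len(bag), label)
                        for bag, label in zip(bags, bag_labels)])
    return X, y

bags = [np.zeros((2, 3)), np.ones((3, 3))]  # two bags with 2 and 3 instances
bag_labels = [1, 0]
X, y = naive_instance_labels(bags, bag_labels)
print(X.shape, y)
```

Every negative instance inside a positive bag is mislabeled as positive under this scheme, which is the source of label noise for a standard supervised learner.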

Overall, MI-ANFIS achieved state-of-the-art performance. On all
tested data sets, MI-ANFIS ranked consistently among the top three.
For MUSK1, PPPM-kernel [24] performed the best (95.6%), but did not
perform as well on the other sets. For MUSK2, Bagging-APR [19]
achieved the best accuracy; as reported by [14], Bagging-APR’s
excellent performance is credited to the use of an ensemble scheme on
the base learner APR [11]. MI-ANFIS achieved the best average
performance for the Fox and Elephant data sets, and the second-best
performance after the miGraph [23] and ALP-SVM [25] methods for the
Tiger data set. It is clear from Table III that Naive-ANFIS performed
the worst; this is essentially due to the naive MIL assumption. In
cases where more information about instances is available, such
information could be used to relax the naive assumption by assigning
better labels at the instance level, and could lead to better
(standard) ANFIS performance.

C.  Discussion 

Fuzzy logic is powerful at modeling knowledge uncertainty and
measurement imprecision [27]. More generally, it is one of the best
frameworks to model vagueness. However, in addition to uncertainty
and imprecision, there is a third vagueness concept that fuzzy logic
does not address quite well. This vagueness concept is due to the
ambiguity that arises when the data have multiple forms of
expression, which is the case for multiple instance problems.
MI-ANFIS deals with ambiguity by introducing the novel concept of
truth instances: when carrying out reasoning using a bag of instances
at Layer 2 (Figure 1), a proposition will not have only one degree of
truth; it will have multiple degrees of truth, which we call truth
instances. This effectively encodes the third vagueness component,
ambiguity, and increases the expressive power of traditional fuzzy
logic.

Learning positive concepts from ambiguously labeled data has been the
core task of various MIL algorithms (e.g., Diverse Density [8]).
MI-ANFIS has shown that it can learn positive concepts effectively
while jointly providing a fuzzy representation of such regions. The
fuzzy representation is combined into meaningful and simple multiple
instance rules that can be easily visualized and interpreted. The
fuzzy representation also offers the advantage of robustness against
noise points that might happen to be close to positive concepts
without necessarily being positive, thus lowering the number of false
positives. MI-ANFIS is fully independent. It does not require
positive concepts to be learned using a different algorithm (e.g.,
Diverse Density [8]) or based on intuition. Moreover, MI-ANFIS does
not rely on any traditional MIL algorithms and can learn its rule
base from data.

V.  Conclusions 

In this paper, we presented MI-ANFIS, a novel neuro-fuzzy
architecture that extends the standard Adaptive Neuro-Fuzzy Inference
System (ANFIS) to reason with bags of instances in order to solve
multiple instance learning problems. We developed a backpropagation
learning algorithm using a thorough and abstract mathematical
formulation, and showed that the proposed system is capable of
learning meaningful concepts from ambiguously labeled data. We
reported on the performance of the proposed algorithm using a
synthetic data set and five benchmark data sets. In these different
scenarios, MI-ANFIS showed promising results.

In future work, we intend to develop a hybrid learning algorithm that
combines a gradient method and a least squares estimator (LSE), in
order to speed up MI-ANFIS training. We will also report on the
complexity of the developed training algorithms.